
Update MG negative sampling to return random samples distributed as specified #4885

Merged

Conversation

@ChuckHastings (Collaborator) commented Jan 23, 2025

Modifies the new negative sampling interface so that, when called from MG, each rank specifies how many samples it wishes to receive, and the samples are randomly distributed across the calling GPUs.

Marked breaking as it changes the C++ interface... although nothing uses it yet.

Closes #4672
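To illustrate the idea, here is a hedged host-side sketch in plain C++ (not the device code in this PR; the function name is hypothetical): given each rank's requested count, build one destination entry per sample and shuffle, so every generated sample lands on a uniformly random GPU while each rank still receives exactly the count it asked for.

// Hedged host-side sketch only; the real implementation operates on device data.
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

std::vector<int> assign_samples_to_ranks(std::vector<std::size_t> const& samples_per_gpu,
                                         std::mt19937& rng)
{
  std::size_t total =
    std::accumulate(samples_per_gpu.begin(), samples_per_gpu.end(), std::size_t{0});

  // One entry per sample; the entry value is the rank that will receive that sample.
  std::vector<int> assignment;
  assignment.reserve(total);
  for (int rank = 0; rank < static_cast<int>(samples_per_gpu.size()); ++rank) {
    assignment.insert(assignment.end(), samples_per_gpu[rank], rank);
  }

  // Shuffle so the sample-to-rank assignment is random rather than blocked,
  // while each rank's total stays exactly what it requested.
  std::shuffle(assignment.begin(), assignment.end(), rng);
  return assignment;
}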

copy-pr-bot (bot) commented Jan 23, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@ChuckHastings marked this pull request as ready for review January 23, 2025 00:52
@ChuckHastings requested a review from a team as a code owner January 23, 2025 00:52
@ChuckHastings self-assigned this Jan 23, 2025
@ChuckHastings added the improvement (Improvement / enhancement to an existing function) and breaking (Breaking change) labels and removed the cuGraph label Jan 23, 2025
@ChuckHastings added this to the 25.02 milestone Jan 23, 2025
@seunghwak (Contributor) left a comment

Shouldn't we update the documentation?

@ChuckHastings (Collaborator, Author)

Shouldn't we update the documentation?

Just pushed an update to the documentation.

@alexbarghi-nv (Member)

Can we hold this PR until I have a corresponding one in cuGraph-GNN?


if constexpr (multi_gpu) {
  samples_per_gpu = host_scalar_allgather(handle.get_comms(), num_samples, handle.get_stream());
  handle.sync_stream();
Collaborator Author

Fixed in next push.

Comment on lines +460 to +469
while (reduction > 0) {
  size_t est_reduction_per_gpu = (reduction + comm_size - 1) / comm_size;
  for (size_t i = 0; i < samples_per_gpu.size(); ++i) {
    if (samples_per_gpu[i] > est_reduction_per_gpu) {
      samples_per_gpu[i] -= est_reduction_per_gpu;
      reduction -= est_reduction_per_gpu;
    } else {
      reduction -= samples_per_gpu[i];
      samples_per_gpu[i] = 0;
    }
Contributor

I think this logic has a flaw.

Say reduction = 3 & comm_size = 2.
Then, est_reduction_per_gpu = 2.
If samples_per_gpu[i] > 2 on both GPUs, reduction = 3 - 2 - 2, which wraps around (unsigned overflow)!

Collaborator Author

This is (I hope) an uncommon case; usually, I believe, it is called with exact_number_of_samples == true. The code's probably not optimal... I was mostly aiming for something functional.

The next line should correct the edge condition you're concerned about. When i is 0, reduction = 3, comm_size = 2, so set_reduction_per_gpu = 2. We'll take the first branch and reduce reduction to 1. The next line will then trigger and reduce set_reduction_per_gpu to 1.
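For reference, a standalone sketch of the loop as described above. The clamp below stands in for the "next line" mentioned in this comment (later identified in the thread as line 470), which is not part of the quoted snippet; this is an illustration of the described behavior, not the code that landed, and it assumes reduction never exceeds the sum of samples_per_gpu.

#include <cstddef>
#include <vector>

// Sketch only: trim `reduction` samples from the per-GPU counts.
void reduce_overage(std::vector<std::size_t>& samples_per_gpu, std::size_t reduction)
{
  std::size_t comm_size = samples_per_gpu.size();
  while (reduction > 0) {
    std::size_t est_reduction_per_gpu = (reduction + comm_size - 1) / comm_size;
    for (std::size_t i = 0; i < samples_per_gpu.size(); ++i) {
      if (samples_per_gpu[i] > est_reduction_per_gpu) {
        samples_per_gpu[i] -= est_reduction_per_gpu;
        reduction -= est_reduction_per_gpu;
      } else {
        reduction -= samples_per_gpu[i];
        samples_per_gpu[i] = 0;
      }
      // Stand-in for the clamp on the line following the quoted snippet: once
      // fewer than est_reduction_per_gpu samples remain to be trimmed, shrink
      // the estimate so the subtractions above cannot wrap below zero.
      // e.g. reduction = 3, comm_size = 2: GPU 0 gives up 2, the estimate drops
      // to 1, and GPU 1 gives up only 1.
      if (reduction < est_reduction_per_gpu) { est_reduction_per_gpu = reduction; }
    }
  }
}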

Contributor

reduce set_reduction_per_gpu to 1.

You mean est_reduction_per_gpu? This is set before the for loop. Once it is set to 2 in line 460, it will be 2 till the end of the for loop in line 416. Am I missing something?

Contributor

Oh, sorry, I missed line 470.

Comment on lines 499 to 521
rmm::device_uvector<int> reduced_ranks(comm_size, handle.get_stream());
rmm::device_uvector<size_t> reduced_counts(comm_size, handle.get_stream());

reduced_ranks.resize(
  thrust::distance(reduced_ranks.begin(),
                   thrust::reduce_by_key(handle.get_thrust_policy(),
                                         gpu_assignment.begin(),
                                         gpu_assignment.end(),
                                         thrust::make_constant_iterator(size_t{1}),
                                         reduced_ranks.begin(),
                                         reduced_counts.begin(),
                                         thrust::equal_to<int>())
                     .first),
  handle.get_stream());
reduced_counts.resize(reduced_ranks.size(), handle.get_stream());

rmm::device_uvector<size_t> send_count(comm_size, handle.get_stream());
thrust::fill(handle.get_thrust_policy(), send_count.begin(), send_count.end(), 0);
thrust::scatter(handle.get_thrust_policy(),
                reduced_counts.begin(),
                reduced_counts.end(),
                reduced_ranks.begin(),
                send_count.begin());
Contributor

gpu_assignment is already sorted.

We can use thrust::upper_bound to find boundaries. This will require just comm_size binary searches and will be much faster.
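A hedged sketch of that suggestion (written against plain Thrust with thrust::device_vector so it is self-contained; the PR code itself uses rmm::device_uvector and the handle's stream and Thrust policy): comm_size binary searches give each rank's end offset in the sorted assignment, and adjacent differences turn those offsets into per-rank send counts.

#include <thrust/adjacent_difference.h>
#include <thrust/binary_search.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/iterator/counting_iterator.h>

// gpu_assignment must be sorted ascending by destination rank.
thrust::device_vector<size_t> compute_send_counts(
  thrust::device_vector<int> const& gpu_assignment, int comm_size)
{
  thrust::device_vector<size_t> send_count(comm_size);

  // For each rank r in [0, comm_size), store the offset one past its last assigned sample.
  thrust::upper_bound(thrust::device,
                      gpu_assignment.begin(),
                      gpu_assignment.end(),
                      thrust::make_counting_iterator(0),
                      thrust::make_counting_iterator(comm_size),
                      send_count.begin());

  // Convert cumulative end offsets into per-rank counts (in-place is allowed here).
  thrust::adjacent_difference(
    thrust::device, send_count.begin(), send_count.end(), send_count.begin());

  return send_count;
}

The reduce_by_key/scatter version above computes the same counts, but it does a full pass over the assignment array; the binary-search version touches only comm_size positions.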

Collaborator Author

Fixed in the next push.

@seunghwak (Contributor) left a comment

LGTM

@ChuckHastings (Collaborator, Author)

/merge

@alexbarghi-nv (Member)

@ChuckHastings doesn't this affect negative sampling in the C API? So once this merges, I'll have to update cugraph-pyg to call this function with the number of samples on each GPU.

@ChuckHastings (Collaborator, Author)

@ChuckHastings doesn't this affect negative sampling in the C API? So once this merges, I'll have to update cugraph-pyg to call this function with the number of samples on each GPU.

This changes the semantic meaning of the parameter. Prior to this change, if you wanted this to run against an MG graph, you would specify the total number of samples across all GPUs as the num_samples parameter on each GPU (the same value on each GPU). With this change, each GPU should specify the number of samples that it wants.

So, depending on what you implemented in cugraph-pyg, you may need to change the value that's passed for num_samples, yes.
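For callers migrating, a value that used to be passed as the global total on every rank can be split into per-rank counts before the call. A hedged, hypothetical helper (not part of the cuGraph API) sketching that split:

#include <cstddef>
#include <vector>

// Hypothetical helper: split the old "global total" into per-rank counts so that
// each rank can pass only the number of samples it should receive.
std::vector<std::size_t> split_global_sample_count(std::size_t global_total, int comm_size)
{
  std::vector<std::size_t> per_rank(comm_size, global_total / comm_size);
  // Hand the remainder out one sample at a time to the lowest ranks.
  for (std::size_t r = 0; r < global_total % comm_size; ++r) {
    ++per_rank[r];
  }
  return per_rank;
}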

@rapids-bot rapids-bot bot merged commit 5a41b41 into rapidsai:branch-25.02 Jan 30, 2025
79 checks passed
alexbarghi-nv added a commit to alexbarghi-nv/cugraph-gnn that referenced this pull request Jan 30, 2025
rapids-bot bot pushed a commit to rapidsai/cugraph-gnn that referenced this pull request Feb 3, 2025
Adds a heterogeneous link prediction example for cuGraph-PyG that uses the Taobao dataset.  Loosely based on the Taobao example from the PyG repository.

Adds ability to specify fanout as a dictionary to better align with PyG API.

Fixes a bug where the number of negative samples was calculated incorrectly, causing additional unwanted negative samples to be generated.

Updates the negative sampling call to match the new behavior added in rapidsai/cugraph#4885

Merge after rapidsai/cugraph#4898

Authors:
  - Alex Barghi (https://github.com/alexbarghi-nv)

Approvers:
  - Tingyu Wang (https://github.com/tingyu66)
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

URL: #104
Labels
breaking (Breaking change), cuGraph, improvement (Improvement / enhancement to an existing function)

Projects
None yet

Development
Successfully merging this pull request may close these issues:
Update Negative Sampling to request edges per worker

3 participants